17 research outputs found

    Machine learning in astronomy

    The search to find answers to the deepest questions we have about the Universe has fueled the collection of data for ever larger volumes of our cosmos. The field of supernova cosmology, for example, is seeing continuous development with upcoming surveys set to produce a vast amount of data that will require new statistical inference and machine learning techniques for processing and analysis. Distinguishing between real objects and artefacts is one of the first steps in any transient science pipeline and, currently, is still carried out by humans - often leading to hand scanners having to sort hundreds or thousands of images per night. This is a time-consuming activity introducing human biases that are extremely hard to characterise. To succeed in the objectives of future transient surveys, the successful substitution of human hand scanners with machine learning techniques for the purpose of this artefact-transient classification therefore represents a vital frontier. In this thesis we test various machine learning algorithms and show that many of them can match the human hand scanner performance in classifying transient difference g, r and i-band imaging data from the SDSS-II SN Survey into real objects and artefacts. Using principal component analysis and linear discriminant analysis, we construct a grand total of 56 feature sets with which to train, optimise and test a Minimum Error Classifier (MEC), a naive Bayes classifier, a k-Nearest Neighbours (kNN) algorithm, a Support Vector Machine (SVM) and the SkyNet artificial neural network
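The real-versus-artefact pipeline described in this abstract (dimensionality reduction followed by a classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the synthetic 11x11 "stamps", the Gaussian blob standing in for a real transient, pure noise standing in for an artefact, and the helper names `pca_fit`, `pca_transform`, `knn_predict` and `make_stamps`. It is not the thesis's feature construction or its SDSS-II data, and it covers only PCA with kNN, not the other classifiers listed.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the training mean and the leading principal axes of X (rows = samples)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (X - mean) @ components.T

def knn_predict(train_X, train_y, test_X, k=5):
    """Majority vote among the k nearest training points (labels are 0 or 1)."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(0)

def make_stamps(n, real, size=11):
    """Hypothetical difference-image stamps: noise, plus a central blob if 'real'."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    blob = np.exp(-((xx - c) ** 2 + (yy - c) ** 2) / (2 * 1.5 ** 2))
    stamps = rng.normal(0.0, 0.3, (n, size, size))
    if real:
        stamps = stamps + blob
    return stamps.reshape(n, -1)

# Label 1 = real object, label 0 = artefact (assumed convention).
train_X = np.vstack([make_stamps(200, True), make_stamps(200, False)])
train_y = np.array([1] * 200 + [0] * 200)
test_X = np.vstack([make_stamps(50, True), make_stamps(50, False)])
test_y = np.array([1] * 50 + [0] * 50)

mean, components = pca_fit(train_X, n_components=5)
pred = knn_predict(pca_transform(train_X, mean, components), train_y,
                   pca_transform(test_X, mean, components), k=5)
accuracy = (pred == test_y).mean()
```

The PCA axes are estimated from the training set only and then reused for the test set, so no test information leaks into the features.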

    Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution

    Models of protein evolution currently come in two flavors: generalist and specialist. Generalist models (e.g. PAM, JTT, WAG) adopt a one-size-fits-all approach, where a single model is estimated from a number of different protein alignments. Specialist models (e.g. mtREV, rtREV, HIVbetween) can be estimated when a large quantity of data are available for a single organism or gene, and are intended for use on that organism or gene only. Unsurprisingly, specialist models outperform generalist models, but in most instances there simply are not enough data available to estimate them. We propose a method for estimating alignment-specific models of protein evolution in which the complexity of the model is adapted to suit the richness of the data. Our method uses non-negative matrix factorization (NNMF) to learn a set of basis matrices from a general dataset containing a large number of alignments of different proteins, thus capturing the dimensions of important variation. It then learns a set of weights that are specific to the organism or gene of interest and for which only a smaller dataset is available. Thus the alignment-specific model is obtained as a weighted sum of the basis matrices. Having been constrained to vary along only as many dimensions as the data justify, the model has far fewer parameters than would be required to estimate a specialist model. We show that our NNMF procedure produces models that outperform existing methods on all but one of 50 test alignments. The basis matrices we obtain confirm the expectation that amino acid properties tend to be conserved, and allow us to quantify, on specific alignments, how the strength of conservation varies across different properties. We also apply our new models to phylogeny inference and show that the resulting phylogenies are different from, and have improved likelihood over, those inferred under standard models
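The two-stage idea in this abstract (learn non-negative basis matrices from a large general dataset, then fit only a few non-negative weights for a small target dataset) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses standard Lee-Seung multiplicative updates on synthetic data, treats each column as a flattened matrix, and minimises squared error. The function names `nmf` and `fit_weights` are hypothetical, and the paper's actual objective and estimation procedure may differ.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Factor V (features x samples) as W @ H with W, H >= 0 (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fit_weights(W, v, n_iter=500, eps=1e-12):
    """With the basis W held fixed, find non-negative weights h such that v ~ W @ h."""
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        h *= (W.T @ v) / (W.T @ W @ h + eps)
    return h

# Synthetic stand-in for a general training set: 40 flattened matrices of
# 60 entries each, all generated from 3 hidden non-negative basis vectors.
rng = np.random.default_rng(1)
true_basis = rng.random((60, 3))
train = true_basis @ rng.random((3, 40))

# Stage 1: learn the basis from the general dataset.
W, H = nmf(train, rank=3)

# Stage 2: for one target (assumed to share the same hidden basis),
# estimate only 3 weights instead of all 60 entries.
target = true_basis @ np.array([0.2, 1.0, 0.5])
h = fit_weights(W, target)
model = W @ h  # target-specific model as a weighted sum of basis columns
rel_err = np.linalg.norm(model - target) / np.linalg.norm(target)
```

The `rank` argument plays the role of the model complexity: a small target dataset would justify few basis matrices, a richer one more.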

    Cosmic Rates of Black Hole Mergers and Pair-Instability Supernovae from Chemically Homogeneous Binary Evolution

    This article has been accepted for publication in Monthly Notices of the Royal Astronomical Society. © 2020 The Author(s). Published by Oxford University Press on behalf of the Royal Astronomical Society. All rights reserved. During the first three observing runs of the Advanced gravitational-wave detector network, the LIGO/Virgo collaboration detected several black hole binary (BHBH) mergers. As the population of detected BHBH mergers grows, it will become possible to constrain different channels for their formation. Here we consider the chemically homogeneous evolution (CHE) channel in close binaries, by performing population synthesis simulations that combine realistic binary models with detailed cosmological calculations of the chemical and star-formation history of the Universe. This allows us to constrain population properties, as well as cosmological and aLIGO detection rates of BHBH mergers formed through this pathway. We predict a BHBH merger rate at redshift zero of $5.8\,\mathrm{Gpc}^{-3}\,\mathrm{yr}^{-1}$ through the CHE channel, to be compared with aLIGO's measured rate of ${53.2}_{-28.2}^{+55.8}\,\mathrm{Gpc}^{-3}\,\mathrm{yr}^{-1}$, and find that eventual merger systems have BH masses in the range $17 - 43\,\mathrm{M}_{\odot}$ below the pair-instability supernova (PISN) gap, and $>124\,\mathrm{M}_{\odot}$ above the PISN gap. We further investigate the effects of momentum kicks during black hole formation, calculate cosmological and magnitude limited PISN rates and investigate the effects of high-redshift deviations in the star formation rate. We find that momentum kicks tend to increase delay times of BHBH systems, and our magnitude limited PISN rate estimates indicate that current deep surveys should be able to detect such events. Lastly, we find that our cosmological merger rate estimates change by at most $\sim 8\%$ for mild deviations of the star formation rate in the early Universe, and by up to $\sim 40\%$ for extreme deviations. Peer reviewed

    Selecting the larger Pandit alignments.

    Each blue dot represents an alignment in the Pandit database. The green region covers the alignments used in the training set, and the thin red region covers those in the test set.

    NNMF basis matrices.

    The set of NNMF basis matrices obtained for ranks ranging from 1 to 5. Amino acids are ordered according to their Stanfel classification [25]. Rates are indicated in grayscale, with pure white being a rate of zero and pure black being the maximum rate in the matrix.